Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion

نویسندگان

Benjamin Milde

Christoph Schmidt

Joachim Köhler

چکیده

Recently, neural sequence-to-sequence (Seq2Seq) models have been applied to the problem of grapheme-to-phoneme (G2P) conversion. These models offer a straightforward way of modeling the conversion by jointly learning the alignment and translation of input to output tokens in an end-to-end fashion. However, until now this approach did not show improved error rates on its own compared to traditional joint-sequence based n-gram models for G2P. In this paper, we investigate how multitask learning can improve the performance of Seq2Seq G2P models. A single Seq2Seq model is trained on multiple phoneme lexicon datasets containing multiple languages and phonetic alphabets. Although multi-language learning does not show improved error rates, combining standard datasets and crawled data with different phonetic alphabets of the same language shows promising error reductions on English and German Seq2Seq G2P conversion. Finally, combining Seq2seq G2P models with standard n-grams based models yields significant improvements over using either model alone.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Joint-sequence models for grapheme-to-phoneme conversion

Grapheme-to-phoneme conversion is the task of finding the pronunciation of a word given its written form. It has important applications in text-to-speech and speech recognition. Joint-sequence models are a simple and theoretically stringent probabilistic framework that is applicable to this problem. This article provides a selfcontained and detailed description of this method. We present a nove...

متن کامل

Massively Multilingual Neural Grapheme-to-Phoneme Conversion

Grapheme-to-phoneme conversion (g2p) is necessary for text-to-speech and automatic speech recognition systems. Most g2p systems are monolingual: they require language-specific data or handcrafting of rules. Such systems are difficult to extend to low resource languages, for which data and handcrafted rules are not available. As an alternative, we present a neural sequence-to-sequence approach t...

متن کامل

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

متن کامل

A latent analogy framework for grapheme-to-phoneme conversion

Data-driven grapheme-to-phoneme conversion involves either (top-down) inductive learning or (bottom-up) pronunciation by analogy. As both approaches rely on local context information, they typically require some external linguistic knowledge, e.g., individual grapheme/phoneme correspondences. To avoid such supervision, this paper proposes an alternative solution, dubbed pronunciation by latent ...

متن کامل

Read, Attend and Pronounce: An Attention-Based Approach for Grapheme-To-Phoneme Conversion

We propose an attention-enabled encoder-decoder model for the problem of grapheme-to-phoneme conversion. Most previous work has tackled the problem via joint sequence models that require explicit alignments for training. In contrast, the attentionenabled encoder-decoder model allows for jointly learning to align and convert characters to phonemes. With this approach, we achieve state-of-the-art...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2017

Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion

نویسندگان

چکیده

منابع مشابه

Joint-sequence models for grapheme-to-phoneme conversion

Massively Multilingual Neural Grapheme-to-Phoneme Conversion

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

A latent analogy framework for grapheme-to-phoneme conversion

Read, Attend and Pronounce: An Attention-Based Approach for Grapheme-To-Phoneme Conversion

عنوان ژورنال:

اشتراک گذاری